(Durinck et al. 2009) (Durinck et al. 2005) (Davis and Meltzer 2007) (Huber et al. 2015) (R Core Team 2022) (Sanghi, Gruber, and Metwally 2021) (Law et al. 2016) (Robinson, DJ, and Smyth 2010) (McCarthy, Y, and Smyth 2012) (Chen, Lun, and Smyth 2016) (Morgan 2021) (Kolberg et al. 2020) (Gu, Eils, and Schlesner 2016)
The data set I selected from GEO (Gene Expression Omnibus) is “APOE4 Causes Widespread Molecular and Cellular Alterations Associated with Alzheimer’s Disease Phenotypes in Human iPSC-Derived Brain Cell Types” (GEO identifier GSE102956), generated by a lab at MIT on the Illumina HiSeq 2000 platform. The study compares three control (APOE3) and three test (APOE4) samples of iPSC-derived brain cells to find out whether the APOE4 variant of the APOE gene causes Alzheimer’s disease.
In the first part of the assignment, the data set was cleaned by checking for duplicates and removing low-count genes and outliers. The mixed NCBI gene IDs and HGNC IDs were then merged and mapped to HGNC IDs. Finally, the data were normalized with TMM so that they could be used in the later analyses.
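The low-count filter described above can be illustrated with a minimal base R sketch. The toy count matrix, the counts-per-million (CPM) threshold of 1, and the requirement of at least 3 samples (the size of the smaller group) are all assumptions made for illustration; they are not the values or the code used in the assignment.

```r
# Toy count matrix (made-up values): 5 features x 6 samples.
# The "other" row stands in for the rest of the transcriptome so that
# library sizes are realistic (about one million reads per sample).
counts <- matrix(
  c(500, 520, 480, 510, 495, 505,
      0,   1,   0,   2,   0,   1,
    120, 130, 110,  90, 100,  95,
      3,   0,   0,   1,   0,   4,
    1e6, 1e6, 1e6, 1e6, 1e6, 1e6),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD", "other"),
                  c("ctrl1", "ctrl2", "ctrl3", "test1", "test2", "test3"))
)

# Check for duplicated gene identifiers before filtering.
has_duplicates <- any(duplicated(rownames(counts)))

# Convert raw counts to counts per million using each sample's library size.
lib_sizes <- colSums(counts)
cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6

# Assumed rule: keep genes with CPM > 1 in at least 3 samples.
min_samples <- 3
keep <- rowSums(cpm > 1) >= min_samples
filtered <- counts[keep, , drop = FALSE]

print(rownames(filtered))
```

In this toy example, geneB and geneD exceed 1 CPM in fewer than three samples, so they are removed before normalization.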
In the second part of the assignment, we calculated the differential expression of the genes between the two groups, test and control. We then used the resulting p-values to find the significantly differentially expressed genes and used their fold changes to identify them as up-regulated or down-regulated. Finally, we ran a thresholded gene set enrichment analysis with g:Profiler to see which pathways these significantly up-regulated and down-regulated genes participate in.
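As a rough illustration of that classification step, the sketch below labels genes from a toy differential expression table. The column names (`logFC`, `pvalue`), the values, and the 0.05 cutoff are hypothetical and chosen only for the example.

```r
# Toy differential expression results (made-up values).
de <- data.frame(
  gene   = c("G1", "G2", "G3", "G4"),
  logFC  = c(2.1, -1.8, 0.4, -0.2),
  pvalue = c(0.001, 0.01, 0.03, 0.60),
  stringsAsFactors = FALSE
)

# Assumed significance cutoff for the thresholded gene lists.
p_cutoff <- 0.05

# Significant genes are split by the sign of their log fold change.
de$status <- ifelse(
  de$pvalue >= p_cutoff, "not significant",
  ifelse(de$logFC > 0, "up-regulated", "down-regulated")
)

# These two lists are what a thresholded tool would receive as queries.
up_genes   <- de$gene[de$status == "up-regulated"]
down_genes <- de$gene[de$status == "down-regulated"]

print(table(de$status))
```

Genes that fail the cutoff (G4 here) are dropped entirely from the thresholded analysis, which matters for the comparison with GSEA later in the report.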
In the third part of the assignment, we run a non-thresholded gene set enrichment analysis with GSEA, visualize the results with Cytoscape, and analyze the resulting pathways.
We ran the non-thresholded gene set enrichment analysis with the GSEA program (4.3.2) (Subramanian et al. 2005). The gene set database is Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol from the Bader lab. The ranked gene list was generated in assignment 2, with all genes ranked by their fold change multiplied by their significance.
knitr::include_graphics("./image/GSEA_exe.png")
Figure 1. Running GSEA on the pre-ranked gene list. The gene set database is Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol from the Bader lab, and the ranked gene list was generated in assignment 2.
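For reference, a ranked list of the kind GSEA expects can be sketched in base R as follows. The exact formula used in assignment 2 is not restated here, so the score below, the sign of the log fold change times -log10 of the p-value, is an assumption (it is one common choice), and the table and column names are hypothetical.

```r
# Toy differential expression results (made-up values).
de <- data.frame(
  gene   = c("G1", "G2", "G3", "G4"),
  logFC  = c(2.1, -1.8, 0.4, -0.2),
  pvalue = c(0.001, 0.01, 0.03, 0.60),
  stringsAsFactors = FALSE
)

# Assumed rank score: direction of change times significance.
de$rank <- sign(de$logFC) * -log10(de$pvalue)

# Every gene is kept, sorted from most up-regulated to most down-regulated.
ranked <- de[order(de$rank, decreasing = TRUE), c("gene", "rank")]

# Write a two-column, tab-delimited rank file to a temporary location.
rnk_file <- tempfile(fileext = ".rnk")
write.table(ranked, rnk_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

print(ranked)
```

Unlike the thresholded gene lists, no gene is discarded here; weakly changed genes simply end up near the middle of the ranking.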
The results returned by GSEA are as follows.
- 3016 / 5126 gene sets are up-regulated
- 733 gene sets are significantly enriched at FDR < 25%
- 339 gene sets are significantly enriched at nominal p-value < 1%
- 691 gene sets are significantly enriched at nominal p-value < 5%
- The top pathway is CADHERIN SIGNALING PATHWAY with NOM p-value of 0.000
knitr::include_graphics("./image/Geneset_enriched_positive.png")
Figure 2. Top 20 positively enriched gene sets. Up-regulated pathways, with CADHERIN SIGNALING PATHWAY as the top result.
- 2110 / 5126 gene sets are down-regulated
- 737 gene sets are significantly enriched at FDR < 25%
- 335 gene sets are significantly enriched at nominal p-value < 1%
- 601 gene sets are significantly enriched at nominal p-value < 5%
- The top pathway is MITOCHONDRIAL TRANSLATION INITIATION with NOM p-value of 0.000
knitr::include_graphics("./image/Geneset_enriched_negative.png")
Figure 3. Top 20 negatively enriched gene sets. Down-regulated pathways, with MITOCHONDRIAL TRANSLATION INITIATION as the top result.
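Summary counts like those listed above can be recomputed from a GSEA report table. The sketch below uses a small made-up data frame instead of a real report; the column names `NES`, `NOM p-val` and `FDR q-val` are assumed to mirror the headers of the GSEA report tables, and `summarise_report` is a hypothetical helper written for this example.

```r
# Toy stand-in for one GSEA report table (made-up values).
report <- data.frame(
  NAME        = c("SET_A", "SET_B", "SET_C", "SET_D"),
  NES         = c(2.4, 1.9, 1.2, 0.8),
  `NOM p-val` = c(0.000, 0.008, 0.030, 0.400),
  `FDR q-val` = c(0.001, 0.100, 0.300, 0.900),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: count gene sets passing each cutoff in the summary.
summarise_report <- function(rep) {
  c(
    fdr_25   = sum(rep[["FDR q-val"]] < 0.25),
    nom_p_01 = sum(rep[["NOM p-val"]] < 0.01),
    nom_p_05 = sum(rep[["NOM p-val"]] < 0.05)
  )
}

stats <- summarise_report(report)

# The top pathway is the one with the largest absolute enrichment score.
top_pathway <- report$NAME[which.max(abs(report$NES))]

print(stats)
print(top_pathway)
```

Applying the same helper to the positive and negative report tables separately would give the two sets of counts reported for the up-regulated and down-regulated gene sets.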
Compare the results from the non-thresholded enrichment analysis with GSEA to the thresholded enrichment analysis with g:Profiler.
In the g:Profiler results, 82 pathways have a p-value <= 0.05 for the significantly up-regulated genes, whereas GSEA reports 691 up-regulated pathways with a p-value <= 0.05. For the significantly down-regulated genes, g:Profiler returns 3 pathways with a p-value <= 0.05, whereas GSEA reports 601 down-regulated pathways with a p-value <= 0.05.
If we compare the two methods in terms of quality, the results of both are promising, even though GSEA returns far more pathways than g:Profiler. GSEA returns more pathways because its analysis also includes the genes that are not significantly differentially expressed between the test and control groups. Together with the significantly differentially expressed genes, these genes contribute to additional pathways in both the up-regulated and down-regulated results.
Since we are interested in the genes and pathways targeted in Alzheimer’s disease, we focus on the pathways related to neural transmission. The top GSEA result for up-regulated genes is CADHERIN SIGNALING PATHWAY, which is heavily involved in cell-cell interactions such as neural signal transmission. The top g:Profiler result for up-regulated genes is neuron migration, which is also closely tied to neurons.
Both methods are insightful for enrichment analysis. However, this is not a straightforward comparison. As mentioned above, GSEA also includes the genes that are not significantly differentially expressed between the test and control groups, while g:Profiler only considers the genes that are. In other words, the two methods are not performed on the same foundation.
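To make the difference concrete, the thresholded side of the comparison can be sketched as a hypergeometric over-representation test, the kind of test g:Profiler is based on. All numbers below are made up, and the sketch omits the multiple-testing correction that a real analysis applies.

```r
# Made-up sizes for one pathway and one thresholded query list.
universe_size <- 1000  # all genes that were tested
pathway_size  <- 50    # genes annotated to the pathway
query_size    <- 100   # significantly up-regulated genes
overlap       <- 15    # query genes that fall in the pathway

# Probability of seeing at least `overlap` pathway genes in a random
# query of this size (upper tail of the hypergeometric distribution).
p_enrich <- phyper(overlap - 1,
                   m = pathway_size,
                   n = universe_size - pathway_size,
                   k = query_size,
                   lower.tail = FALSE)

# Expected overlap under the null hypothesis, for comparison.
expected_overlap <- query_size * pathway_size / universe_size

print(p_enrich)
print(expected_overlap)
```

Only membership in the query list enters this test. A gene just below the significance cutoff contributes nothing, whereas in GSEA it still shifts the running enrichment score through its position in the ranked list.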
Use the EnrichmentMap app in Cytoscape to visualize the results generated by GSEA.
The network contains 482 nodes and 4387 edges.
The parameters I used were a q-value node cutoff of 0.05 and an edge cutoff of 0.5.
knitr::include_graphics("./image/EM_overall.png")
Figure 4. Publication-ready figure. Generated by the EnrichmentMap app in Cytoscape with 482 nodes and 4387 edges at a q-value node cutoff of 0.05 and an edge cutoff of 0.5. Red nodes are up-regulated pathways and blue nodes are down-regulated pathways.
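The edge cutoff controls how much gene overlap two gene sets need before they are connected. Which similarity metric was selected is not stated here, so the sketch below simply shows the Jaccard and overlap coefficients and an equal-weight average of the two as an assumed combined score. The gene sets are placeholders.

```r
# Two placeholder gene sets sharing three genes.
set_a <- c("g1", "g2", "g3", "g4")
set_b <- c("g2", "g3", "g4", "g5", "g6", "g7")

# Jaccard coefficient: shared genes over all distinct genes.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Overlap coefficient: shared genes over the size of the smaller set.
overlap_coef <- function(a, b) {
  length(intersect(a, b)) / min(length(a), length(b))
}

# Assumed combined score: weighted average of the two coefficients.
combined <- function(a, b, k = 0.5) {
  k * overlap_coef(a, b) + (1 - k) * jaccard(a, b)
}

edge_cutoff <- 0.5
scores <- c(jaccard  = jaccard(set_a, set_b),
            overlap  = overlap_coef(set_a, set_b),
            combined = combined(set_a, set_b))

# An edge is drawn only when the chosen score reaches the cutoff.
print(scores)
print(scores >= edge_cutoff)
```

With these sets, the Jaccard score (3/7) falls below 0.5 while the overlap and combined scores pass it, so the choice of metric changes how dense the network looks at the same cutoff.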
knitr::include_graphics("./image/EM_Cluster_upreg.png")
Figure 5. Large up-regulated cluster in the enrichment map. The pathways involved are closely related to synapses.
knitr::include_graphics("./image/EM_Cluster_downreg.png")
Figure 6. Large down-regulated cluster in the enrichment map. The pathways involved are closely related to the cell cycle.
knitr::include_graphics("./image/Auto_annotated.png")
Figure 7. Network annotated by the AutoAnnotate app within Cytoscape with default parameters.
knitr::include_graphics("./image/AutAnnotation_para.png")
Figure 8. AutoAnnotate Default Parameters
knitr::include_graphics("./image/Collapse_network.png")
Figure 9. Collapsed network generated by the AutoAnnotate app within Cytoscape, which collapses pathways that are functionally related or similar.
I chose the SYNAPSE ORGANIZATION pathway (GO:0050808) to investigate in more detail because this pathway is up-regulated in the test group, which aligns with the findings in the paper. More importantly, this pathway contains the APOE gene, which is the center of our analysis. According to GO, “this pathway is carried out at the cellular level which results in the assembly, arrangement of constituent parts, or disassembly of a synapse, the junction between a neuron and a target (neuron, muscle, or secretory cell).” In the original paper where these RNA-seq data were published, the researchers observed increased miniature excitatory postsynaptic current (mEPSC) frequencies with indistinguishable mEPSC amplitudes in APOE4 neurons compared to APOE3 controls, suggesting increased release of neurotransmitter or elevated synaptic density in APOE4 neurons (Lin et al. 2018). In addition, “increased synaptic activity has been shown to correlate with increased Aβ production” (Bero et al. 2011), and Aβ production is believed to be one of the causes of Alzheimer’s disease. Therefore, the up-regulation of the SYNAPSE ORGANIZATION pathway, potentially caused by the APOE4 gene variant, could plausibly increase synaptic density and thereby elevate Aβ, contributing to Alzheimer’s disease, which is essentially what we find in this assignment.
knitr::include_graphics("./image/Gene_Network.png")
Figure 10. The SYNAPSE ORGANIZATION pathway gene network generated by GeneMANIA in Cytoscape. Nodes are coloured by the data type they come from. The depth of colour is determined by the rank value.
As demonstrated in the third part of the assignment, GSEA returned a very promising result that strongly supports the mechanisms discussed in the original paper. The GSEA result showed that the APOE4 variant (test group) has up-regulated synapse-related pathways. In the paper, the researchers also found evidence of elevated synaptic density in APOE4 neurons (Lin et al. 2018), and increased synaptic activity has been shown to correlate with increased Aβ production (Bero et al. 2011), which is believed to be one of the causes of Alzheimer’s disease. This is essentially what we find in this assignment: the correlation between APOE4 and Alzheimer’s disease.
If we compare the results from g:Profiler and GSEA, the results of both are promising, even though GSEA returns far more pathways than g:Profiler. GSEA returns more pathways because its analysis also includes the genes that are not significantly differentially expressed between the test and control groups. Together with the significantly differentially expressed genes, these genes contribute to additional pathways in both the up-regulated and down-regulated results.